5 research outputs found

    The effects of F0, intensity and durational manipulations on the perception of stress in dysarthric speech

    Marking stress plays an important role in conveying meaning and directing the listener's attention to the important parts of a message. Extensive research has been conducted into how healthy speakers produce stress, with the key phonetic cues acknowledged as F0, intensity and duration (Bolinger, 1961). We also know that speakers with dysarthria experience problems in marking stress successfully (Lowit et al., 2012; Patel & Campellone, 2009). However, we currently lack sufficiently specific information on these deficits and on potential compensatory techniques to allow Speech and Language Therapists (SLTs) to provide effective treatment methods for stress production problems.

    In order to build an evidence base for intervention, it is essential to first establish the relationship between features of disordered stress production and their perceptual outcomes. In particular, we need to know which phonetic cues (or combinations thereof) are most salient and what degree of change needs to be achieved in order to signal stress effectively to listeners. This project aims to explore these questions in detail by performing perceptual experiments on acoustically manipulated data from disordered speakers, in order to produce guidance for clinicians on how to improve their patients' ability to signal stress successfully.

    We used contrastive stress sentences from 10 speakers with ataxic dysarthria. Each speaker produced 30 sentences, i.e. 10 sentences (SVOA structures) across 3 conditions (stress on the initial (S), medial (O) or final (A) target word). Sentences were perceptually scored by 5 listeners regarding the location of the stress target. We then chose 15 utterances where listeners had been unable to identify the target, five for each of the sentence positions. These utterances were subsequently manipulated acoustically by incrementally increasing the F0, intensity and duration of the target words, in accordance with the degree of change observed in the healthy control group.
    In addition, pausing patterns as well as intonation contours were altered. The manipulated utterances were played to 50 listeners to evaluate what degree and combination of alteration resulted in correct identification of the stress target. We will report on the patterns of impairment observed in the disordered speech samples, as well as the impact of the above manipulations on listener accuracy. This will provide information for future studies on stress production regarding the focus of analysis, as well as guide clinicians on how best to address deficits in this area in their patients.
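    As an illustration of the kind of incremental manipulation described above, the sketch below applies a decibel intensity gain and a duration stretch to a target-word span of a waveform (NumPy only; the gain values, span boundaries and stretch factor are illustrative assumptions, and F0 raising, which typically requires PSOLA-style resynthesis as in Praat, is not reproduced here):

```python
import numpy as np

def scale_intensity(segment, gain_db):
    """Scale the amplitude of a waveform segment by gain_db decibels."""
    return segment * 10.0 ** (gain_db / 20.0)

def stretch_duration(segment, factor):
    """Lengthen (factor > 1) or shorten a segment by linear interpolation."""
    n_out = int(round(len(segment) * factor))
    x_old = np.arange(len(segment))
    x_new = np.linspace(0, len(segment) - 1, n_out)
    return np.interp(x_new, x_old, segment)

def manipulate_target(signal, start, end, gain_db=0.0, dur_factor=1.0):
    """Apply intensity and duration changes to the target-word span [start, end)."""
    target = stretch_duration(scale_intensity(signal[start:end], gain_db), dur_factor)
    return np.concatenate([signal[:start], target, signal[end:]])
```

    Stepping `gain_db` and `dur_factor` through a grid of values would produce the incrementally manipulated stimuli that are then played to listeners.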

    Prosodic feature extraction for assessment and treatment of dysarthria

    Dysarthria, a neurological motor speech disorder caused by lesions to the central and peripheral nervous system, accounted for over 40% of the neurological disorders referred to speech pathologists in 2013 [1]. The disorder impairs speakers' control of the speech production muscles due to muscle weakness, and is characterised by reduced loudness, high pitch variability, monotonous speech, poor voice quality and reduced intelligibility [2]. Current techniques for dysarthria assessment are based on perception and do not give objective measurements of the severity of the disorder; there is therefore a need to explore objective techniques for dysarthria assessment and treatment. The goal of this research is to identify and extract the main acoustic features that can describe the type and severity of the disorder. An acoustic feature extraction and classification technique is proposed in this work. The proposed method involves a pre-processing stage, in which audio samples are filtered to remove noise and resampled at 8 kHz, followed by a feature extraction stage, in which pitch, intensity, formants, zero-crossing rate, speech rate and cepstral coefficients are extracted from the speech samples. Classification of the extracted features is carried out using a single-layer neural network. After classification, a treatment tool is to be developed to assist patients, through tailored exercises, to improve their articulatory ability, intelligibility, intonation and voice quality. This technique will assist speech therapists in tracking the progress of patients over time and will provide an objective acoustic measurement for dysarthria severity assessment. Potential applications of this technology include the management of cognitive speech impairments, the treatment of speech difficulties in children, and other advanced speech and language applications.
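    A few of the frame-level features named above (zero-crossing rate, energy, pitch) can be sketched as follows; the frame sizes and the autocorrelation-based pitch tracker are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Slice a signal (e.g. resampled to 8 kHz) into overlapping frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def short_time_energy(frames):
    """Sum of squared samples per frame (an intensity proxy)."""
    return np.sum(frames ** 2, axis=1)

def autocorr_pitch(frame, sr=8000, fmin=60, fmax=400):
    """Crude per-frame F0 estimate from the autocorrelation peak."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    lo, hi = sr // fmax, sr // fmin
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag
```

    Feature vectors assembled from such per-frame measures (together with formants, speech rate and cepstral coefficients) would then be fed to the classifier.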

    Acoustic-based assistive technology tools for dysarthria management

    No full text
    Date of Award: 2020 (the record's stated date of 2019 is incorrect). The research presented in this thesis addresses the application of digital signal processing algorithms to the detection and treatment of dysarthria, a neurological motor speech disorder. The novel algorithms presented in this thesis include: a silence, unvoiced and voiced segmentation technique for dysarthric speech based on linear prediction error variance (LPEV); an automatic diadochokinetic (DDK) analysis and segmentation scheme for dysarthric speech; the application of speech processing algorithms to the extraction of prosodic, voice quality, pronunciation and wavelet features for the detection and severity classification of dysarthric speech; and the modification of dysarthric speech features using speech enhancement techniques to improve the intelligibility of dysarthric speech in a stress production exercise for the treatment of dysarthria.

    In particular, an improved silence, unvoiced and voiced segmentation technique for dysarthric speech is proposed. This method uses a two-layer segmentation approach that combines short-time energy (STE) and LPEV to differentiate between silence and voiced segments despite the reduced or inconsistent intensity, pauses, voice breaks and slow speech rate found in dysarthric speech. Including the LPEV in the segmentation process proved advantageous in eliminating segmentation errors caused by the similarity between the STE profiles of silence and voiced segments in dysarthric speech. Experimental results show that this segmentation method is also effective in reducing the effects of artefacts introduced in dysarthric speech.

    A novel automatic DDK analysis scheme is proposed in this research to extract individual DDK syllables and analyse them for consistency.
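    A minimal sketch of the quantities behind the two-layer STE/LPEV segmentation — short-time energy plus a per-frame linear prediction error variance computed with the Levinson-Durbin recursion. The frame sizes and both thresholds are assumed values for illustration, not the thesis's:

```python
import numpy as np

def lp_error_variance(frame, order=10):
    """LP residual energy via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 : len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                    # reflection coefficient
        a[1 : i + 1] += k * a[i - 1 :: -1]
        err *= 1.0 - k * k
    return err

def label_frames(x, frame_len=240, hop=120, order=10,
                 ste_floor=1e-4, lpev_ratio=0.1):
    """Two-layer labelling: STE separates silence from speech; the LPEV,
    normalised by frame energy, separates noise-like unvoiced frames
    (poorly predicted, high LPEV) from well-predicted voiced frames."""
    labels = []
    for i in range(0, len(x) - frame_len + 1, hop):
        f = x[i : i + frame_len]
        ste = float(np.sum(f ** 2))
        if ste < ste_floor:               # layer 1: energy
            labels.append("silence")
        else:                             # layer 2: predictability
            nlpev = lp_error_variance(f, order) / ste
            labels.append("unvoiced" if nlpev > lpev_ratio else "voiced")
    return labels
```

    A periodic (voiced) frame is almost perfectly linearly predictable, so its normalised LPEV is near zero, while a noise-like frame keeps most of its energy in the residual — which is what makes the second layer useful when STE alone is ambiguous.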
    This method is based on a speaker-specific moving average threshold (rather than a fixed threshold), which addresses the varying intensities of the DDK sounds produced by speakers with dysarthria. It also addresses the challenge of intra-syllable breaks in dysarthric DDK syllables using a minimum distance merging approach. In addition, the algorithm analyses the segmented DDK syllables by calculating the individual DDK rates and their variance in order to measure the consistency of DDK syllable production. The high accuracy of the proposed method is verified using both dysarthric and healthy control databases.

    Three novel schemes for automatic detection and severity classification of dysarthric speech are also proposed. The first extracts an extended speech feature called the centroid formant (a representation of energy concentration in the frequency spectrum) and classifies these centroid formants using neural network classifiers to detect dysarthria. The centroid formant-based detection scheme also forms the backbone of the second, more robust detection scheme, which combines centroid formants with prosodic, voice quality, pronunciation and wavelet features for more efficient classification. A third scheme is developed specifically to classify dysarthria into three severity levels using the same features as the second scheme. The efficiency of these detection and severity classification schemes is evaluated by calculating the accuracy, sensitivity and specificity of the classifiers.

    The effects of modifying the prosodic cues used in stress production on listeners' ability to correctly identify the position of the stressed word in a sentence are also investigated. This investigation focuses on the three prosodic cues used by healthy control speakers in stress production, namely intensity, duration and fundamental frequency.
    These three features are modified acoustically and presented to untrained listeners in order to evaluate the effects of the individual and combined modifications on the listeners' perception. The findings of this investigation will help clinicians, including speech and language therapists, make an informed decision about which prosodic feature to focus on during stress production exercises for the management of dysarthria.

    Finally, the dysarthria management schemes proposed in this research are developed into user-interactive tools in MATLAB, from which speaker-specific information and reports can be generated and downloaded for progress monitoring and further analysis.
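    The DDK scheme summarised above — a moving-average energy threshold, minimum-distance merging of intra-syllable breaks, and rate-variance statistics — might be sketched as follows (all frame sizes, the window length and the merge gap are assumptions for illustration):

```python
import numpy as np

def ddk_syllables(x, sr=8000, frame_len=160, hop=80,
                  win_frames=25, min_gap=0.05):
    """Segment DDK syllables. A moving average of the short-time energy
    acts as a speaker-specific threshold, so bursts are detected relative
    to the local intensity level; bursts separated by less than `min_gap`
    seconds are merged (intra-syllable breaks). Returns (start, end)
    sample pairs."""
    n = 1 + (len(x) - frame_len) // hop
    ste = np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                    for i in range(n)])
    thr = np.convolve(ste, np.ones(win_frames) / win_frames, mode="same")
    active = ste > thr
    bursts, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            bursts.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        bursts.append((start * hop, (n - 1) * hop + frame_len))
    merged = []
    for b in bursts:
        if merged and (b[0] - merged[-1][1]) / sr < min_gap:
            merged[-1] = (merged[-1][0], b[1])   # minimum-distance merge
        else:
            merged.append(b)
    return merged

def ddk_rate_stats(syllables, sr=8000):
    """Mean DDK rate (from onset-to-onset intervals) and its variance,
    as a production-consistency measure."""
    onsets = np.array([s for s, _ in syllables]) / sr
    rates = 1.0 / np.diff(onsets)
    return float(np.mean(rates)), float(np.var(rates))
```

    Because the threshold tracks the local energy envelope rather than a fixed level, quiet syllables late in a fatiguing /pa-ta-ka/ run are still detected relative to their surroundings.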

    The effects of acoustic modification of fundamental frequency, intensity and duration on the perception of stress in dysarthric speech

    No full text
    Stress marking plays an essential role in conveying meaning and drawing the listener's attention to specific parts of a message. Past research has shown that healthy speakers mark stress using three main acoustic cues: pitch, intensity and duration. We also know that dysarthric speakers experience problems in manipulating these acoustic cues when marking stress. The relationship between acoustic cues and listeners' perception is vital to the development of a computer-based tool that aids Speech and Language Therapists (SLTs) in providing effective treatment to dysarthric speakers. The aim of this study is therefore to investigate the acoustic cue deficiencies in dysarthric speech and the potential compensatory techniques needed for effective treatment. We investigate the relationship between the acoustic features of dysarthric stress production and what is perceived by listeners, as well as which acoustic cues (or combinations thereof) are the most significant and what degree of change needs to be achieved in order to mark stress effectively for listeners. We perform perceptual experiments on dysarthric speech involving acoustic modification of stress-marked sentences from 10 speakers with ataxic dysarthria. Each speaker produced 30 sentences using 10 Subject-Verb-Object-Adjective (SVOA) structured sentences across three stress conditions: stress on the initial (S), medial (O) or final (A) target word. The sentences were perceptually scored by 5 untrained listeners based on the location of the stress target. We then identified sentences where listeners were not able to identify the target word and chose 15 of them across the 3 sentence conditions for our study.
    To measure the deficiencies in dysarthric speech, the acoustic features (pitch, intensity and duration) of the target words in the selected sentences were modified incrementally, in accordance with the degree of change observed in an age- and gender-matched healthy control group. The modified sentences were replayed to 50 listeners to evaluate what degree and combination of modifications resulted in correct identification of the stress target. We present the patterns of deficiency observed in dysarthric speech when marking stress, as well as the effects of modifying the acoustic cues on the listeners' ability to identify the stress target. We also report on the effects of combining these acoustic feature modifications (fundamental frequency, intensity and duration). The results of this study will provide information and guidance for future studies on stress production with regard to the focus of analysis. They will also provide an evidence base for intervention, guiding Speech and Language Therapists (SLTs) on how best to address deficits in stress marking when treating their patients. With regard to other speech disorders, the results can also offer guidance in developing acoustic tools for monitoring and managing patients during and after treatment.
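    The evaluation step — scoring what fraction of listeners identified the stress target under each manipulation condition — is a simple aggregation. A hypothetical sketch, where the condition names are illustrative only:

```python
import numpy as np

def identification_accuracy(responses, targets):
    """Fraction of listener responses matching the intended stress position."""
    responses = np.asarray(responses)
    targets = np.asarray(targets)
    return float(np.mean(responses == targets))

def accuracy_by_condition(trials):
    """trials: iterable of (condition, response, target) tuples, where
    condition names the manipulation applied (e.g. 'F0', 'intensity',
    'duration', 'combined' -- illustrative names, not the study's).
    Returns {condition: identification accuracy}."""
    out = {}
    for cond in sorted({c for c, _, _ in trials}):
        sub = [(r, t) for c, r, t in trials if c == cond]
        out[cond] = identification_accuracy([r for r, _ in sub],
                                            [t for _, t in sub])
    return out
```

    Comparing these per-condition accuracies is what reveals which cue, and what degree of change, most improves listeners' identification of the stress target.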

    SAR sea ice image segmentation using watershed with intensity-based region merging

    No full text
    A new approach to Synthetic Aperture Radar (SAR) sea ice image segmentation for the retrieval of floe size distribution (FSD) is proposed. The method consists of three stages. The first stage pre-processes the SAR image with median filtering to reduce speckle noise. The second stage performs an initial segmentation of the image using the watershed transform. The third stage is a region merging process based on the difference in mean intensity between adjacent regions, where adjacency is defined by a region adjacency graph with a minimum distance of 1 pixel between regions. A threshold value is set for the merging process: if the difference in mean intensity between two adjacent regions is less than the threshold, the regions are merged. Experimental results show the efficacy of the proposed method in segmenting SAR sea ice images.
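    The third-stage merging rule can be sketched on a label image as follows (the median-filtering and watershed stages themselves are standard and omitted; the 4-connectivity adjacency test and the threshold value here are illustrative):

```python
import numpy as np

def adjacent_pairs(labels):
    """Pairs of region labels that touch (4-connectivity, 1-pixel distance)."""
    pairs = set()
    for a, b in [(labels[:, :-1], labels[:, 1:]),
                 (labels[:-1, :], labels[1:, :])]:
        touching = a != b
        pairs.update(zip(a[touching], b[touching]))
    return {(min(p), max(p)) for p in pairs}

def merge_regions(labels, image, threshold):
    """Merge adjacent regions whose mean intensities differ by less than
    `threshold`, repeating until no adjacent pair qualifies. `labels` is
    an initial over-segmentation (e.g. from a watershed transform)."""
    labels = labels.copy()
    changed = True
    while changed:
        changed = False
        means = {l: float(image[labels == l].mean()) for l in np.unique(labels)}
        for a, b in sorted(adjacent_pairs(labels)):
            if abs(means[a] - means[b]) < threshold:
                labels[labels == b] = a   # absorb b into a
                changed = True
                break                     # recompute means and adjacency
    return labels
```

    Recomputing region means after every merge keeps the criterion consistent as regions grow, at the cost of extra passes; this is the usual trade-off in iterative region-merging schemes.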